A Tool for Efficient Content Compilation

Author

  • Boris A. Galitsky

Abstract

We build a tool to assist in content creation by mining the web for information relevant to a given topic. This tool imitates the process of essay writing by humans: searching for topics on the web, selecting content fragments from the found documents, and then compiling these fragments to obtain a coherent text. The process of writing starts with automated building of a table of contents by obtaining the list of key entities for the given topic, extracted from web resources such as Wikipedia. Once a table of contents is formed, each item serves as a seed for web mining. The tool builds a full-featured structured Word document with a table of contents, section structure, images and captions, and web references for all included text fragments. Two linguistic technologies are employed. For relevance verification, we use similarity computed as a tree similarity between the parse trees of a seed and a candidate text fragment. For text coherence, we use a measure of agreement between a given paragraph and the one that follows it, obtained by tree kernel learning on their discourse trees. The tool is available at http://animatronica.io/submit.html.

1 Introducing the content compilation problem

In modern society, writing and creating content is one of the most frequent human activities. An army of content creators, from students to professional writers, produces various kinds of documents for various audiences. Not all of these documents are expected to be innovative, breakthrough or extremely important. The tool being proposed targets the routine document creation process (Fig. 1), where most information is available on the web and needs to be collected, integrated and properly referenced. A number of content generation software systems are available in specific business domains (Johnson 2016). Most content generation software is template-based, which limits its efficiency and the volume of content produced (Hendrikx et al 2015).
An interesting class of content generation systems is based on verbalizing numerical data. Content generation for computer game support has also turned out to be fruitful (Liapis et al 2013). Deep-learning-based generation of a sequence of words has limited applicability for large-scale industrial content production systems. The goal of this study is to build a content compilation assistance system that meets the following criteria:

  • Produces a high volume of cohesive text on a given topic in a domain-independent manner;
  • Collects text fragments from the web and combines them to assist in research on a given topic, providing systematic references;
  • Combines text, image and video resources in the resultant document;
  • Is suitable for producing a final report and for manual editing by students and researchers in various fields of science, engineering, business and law.

On the bottom-left of Fig. 1 we show the main problem that needs to be solved to build a document from fragments collected from the web. Given two fragments, we need to determine whether one can reasonably follow the other in a cohesive manner. We build a discourse representation for each fragment and learn from this representation to classify a pair of consecutive paragraphs as cohesive or not.
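To make the idea of comparing trees more concrete, here is a minimal sketch of a convolution tree kernel that counts the tree fragments two trees share. It assumes the standard subset-tree kernel of Collins and Duffy with no decay factor; the excerpt does not say which kernel the tool actually uses, and the tree encoding, the function names (`production`, `nodes`, `delta`, `kernel`, `normalized_kernel`) and the toy trees are all illustrative assumptions. A learner such as an SVM would consume these kernel values; that step is not shown.

```python
from math import sqrt

# Assumed encoding: a tree is (label, [children]); a child is either
# another tree or a plain string (a word). This is not the tool's format.


def production(node):
    """Return the node's label plus the tagged sequence of its children."""
    label, children = node
    rhs = tuple(
        ("word", c) if isinstance(c, str) else ("node", c[0]) for c in children
    )
    return (label, rhs)


def nodes(tree):
    """Yield every internal node of the tree (words are skipped)."""
    yield tree
    for child in tree[1]:
        if not isinstance(child, str):
            yield from nodes(child)


def delta(n1, n2):
    """Count the common tree fragments rooted at both n1 and n2."""
    if production(n1) != production(n2):
        return 0
    result = 1
    # Equal productions guarantee the children line up position by position.
    for c1, c2 in zip(n1[1], n2[1]):
        if not isinstance(c1, str):
            result *= 1 + delta(c1, c2)
    return result


def kernel(t1, t2):
    """Unnormalized kernel: shared fragments summed over all node pairs."""
    return sum(delta(a, b) for a in nodes(t1) for b in nodes(t2))


def normalized_kernel(t1, t2):
    """Scale the kernel so that identical trees score 1."""
    return kernel(t1, t2) / sqrt(kernel(t1, t1) * kernel(t2, t2))


# Toy trees that differ in a single word (hypothetical data).
seed = ("S", [("NP", ["tool"]), ("VP", ["builds"])])
candidate = ("S", [("NP", ["system"]), ("VP", ["builds"])])

if __name__ == "__main__":
    print(kernel(seed, candidate))
    print(normalized_kernel(seed, candidate))
```

The same function applies to any labeled tree, so under these assumptions it could score parse trees for relevance or discourse trees for coherence; only the node labels change.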


Similar resources

Developing and Validation of Post-Traumatic Growth Protocol for Iranian Veterans

Background and Aim: Post-traumatic growth is a new concept in psychology that refers to positive personal changes occurring after a person's exposure to traumatic events. The purpose of this study is to compile and validate a post-traumatic growth protocol for veterans of war. Methods: This research was conducted as a cross-sectional descriptive study in 2018. The statistical ...

Discourse Community Collocations and L2 Writing Content

Taking the position that writing can be an important skill to foster knowledge building pedagogy, this article explores vocabulary as a supportive tool for this purpose. Having this in mind, a compilation of conceptually loaded vocabularies pertaining to seven discourse communities was developed, two of which were given to a group of L2 writers to investigate the implications of phraseology for...

A Comparative Study of the Stages of Encyclopedia Compilation in the Encyclopaedia of the World of Islam and the Encyclopaedia of Islam (Leiden Edition)

Purpose: The present paper compares two reference works, the Encyclopaedia of the World of Islam and the Encyclopaedia of Islam (Leiden), with regard to the whole process of compiling an encyclopedia, and conducts an evaluative content analysis. Methodology: This is a comparative survey and a content analysis study. The data collection tool in the comparative survey section is a questionnaire, and in...

Efficient stochastic simulation of reaction–diffusion processes via direct compilation

We present the Stochastic Simulator Compiler (SSC), a tool for exact stochastic simulations of well-mixed and spatially heterogeneous systems. SSC is the first tool to allow a readable high-level description with spatially heterogeneous simulation algorithms and complex geometries; this permits large systems to be expressed concisely. Meanwhile, direct native-code compilation allows SSC to gene...

Normalization and Compilation of Deductive and Object-Oriented Databases Programs for Efficient Query Evaluation

A normalization process is proposed to serve not only as a preprocessing stage for compilation and evaluation but also as a tool for classifying recursions. Then the query-independent compilation and chain-based evaluation method can be extended naturally to process a class of DOOD programs and queries. The query-independent compilation captures the bindings that could be difficult to be captured...


Publication date: 2016